NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

Adversarial Perturbations Are Formed by Iteratively Learning Linear Combinations of the Right Singular Vectors of the Adversarial Jacobian

Paniagua, Thomas; Savadikar, Chinmay; Wu, Tianfu (May 2025, Proceedings of the 42 nd International Conference on Machine Learning, Vancouver, Canada. PMLR 267, 2025)

White-box targeted adversarial attacks reveal core vulnerabilities in Deep Neural Networks (DNNs), yet two key challenges persist: (i) How many target classes can be attacked simultaneously in a specified order, known as the ordered top-K attack problem (K ≥ 1)? (ii) How to compute the corresponding adversarial perturbations for a given benign image directly in the image space? We address both by showing that ordered top-K perturbations can be learned via iteratively optimizing linear combinations of the right singular vectors of the adversarial Jacobian (i.e., the logit-to-image Jacobian constrained by target ranking). These vectors span an orthogonal, informative subspace in the image domain. We introduce RisingAttacK, a novel Sequential Quadratic Programming (SQP)-based method that exploits this structure. We propose a holistic figure-of-merits (FoM) metric combining attack success rates (ASRs) and ℓp-norms (p = 1, 2, ∞). Extensive experiments on ImageNet-1k across six ordered top-K levels (K = 1, 5, 10, 15, 20, 25, 30) and four models (ResNet-50, DenseNet-121, ViTB, DEiT-B) show RisingAttacK consistently surpasses the state-of-the-art QuadAttacK.
more » « less
Full Text Available
QuadAttacK: A Quadratic Programming Approach to Learning Ordered Top-K Adversarial Attacks

Paniagua, Thomas; Grainger, Ryan; Wu, Tianfu (December 2023, 37th Conference on Neural Information Processing Systems (NeurIPS 2023))

The adversarial vulnerability of Deep Neural Networks (DNNs) has been wellknown and widely concerned, often under the context of learning top-1 attacks (e.g., fooling a DNN to classify a cat image as dog). This paper shows that the concern is much more serious by learning significantly more aggressive ordered top-K clearbox 1 targeted attacks proposed in [Zhang and Wu, 2020]. We propose a novel and rigorous quadratic programming (QP) method of learning ordered top-K attacks with low computing cost, dubbed as QuadAttacK. Our QuadAttacK directly solves the QP to satisfy the attack constraint in the feature embedding space (i.e., the input space to the final linear classifier), which thus exploits the semantics of the feature embedding space (i.e., the principle of class coherence). With the optimized feature embedding vector perturbation, it then computes the adversarial perturbation in the data space via the vanilla one-step back-propagation. In experiments, the proposed QuadAttacK is tested in the ImageNet-1k classification using ResNet-50, DenseNet-121, and Vision Transformers (ViT-B and DEiT-S). It successfully pushes the boundary of successful ordered top-K attacks from K = 10 up to K = 20 at a cheap budget (1 × 60) and further improves attack success rates for K = 5 for all tested models, while retaining the performance for K = 1.
more » « less
Full Text Available
PaCa-ViT: Learning Patch-to-Cluster Attention in Vision Transformers

https://doi.org/10.1109/CVPR52729.2023.01781

Grainger, Ryan; Paniagua, Thomas; Song, Xi; Cuntoor, Naresh; Lee, Mun Wai; Wu, Tianfu (June 2023, IEEE)

Vision Transformers (ViTs) are built on the assumption of treating image patches as “visual tokens” and learn patch-to-patch attention. The patch embedding based tokenizer has a semantic gap with respect to its counterpart, the textual tokenizer. The patch-to-patch attention suffers from the quadratic complexity issue, and also makes it non-trivial to explain learned ViTs. To address these issues in ViT, this paper proposes to learn Patch-to-Cluster attention (PaCa) in ViT. Queries in our PaCa-ViT starts with patches, while keys and values are directly based on clustering (with a predefined small number of clusters). The clusters are learned end-to-end, leading to better tokenizers and inducing joint clustering-for-attention and attention-for-clustering for better and interpretable models. The quadratic complexity is relaxed to linear complexity. The proposed PaCa module is used in designing efficient and interpretable ViT backbones and semantic segmentation head networks. In experiments, the proposed methods are tested on ImageNet-1k image classification, MS-COCO object detection and instance segmentation and MIT-ADE20k semantic segmentation. Compared with the prior art, it obtains better performance in all the three benchmarks than the SWin [32] and the PVTs [47], [48] by significant margins in ImageNet-1k and MIT-ADE20k. It is also significantly more efficient than PVT models in MS-COCO and MIT-ADE20k due to the linear complexity. The learned clusters are semantically meaningful. Code and model checkpoints are available at https:/github.com/iVMCL/PaCaViT.
more » « less
Full Text Available

Search for: All records